On the Structure of Real-Time Source Coders
The outputs of a discrete time source with memory are to be encoded
("quantized" or "compressed") into a sequence of discrete variables. 
From this latter sequence, a receiver must attempt to approximate 
some features of the source sequence. Operation is in real time, and 
the distortion measure does not tolerate delays. Such a situation has 
been investigated over infinite time spans by B. McMillan. In the 
present work, only finite time spans are considered. The main result 
is the following. If the source is kth-order Markov, one may, without 
loss, assume that the encoder forms each output using only the last k 
source symbols and the present state of the receiver's memory. An 
example is constructed, which shows that the Markov property is 
essential. The case of delay is also considered. 

I. INTRODUCTION 

The outputs of a discrete time source with memory are to be encoded 
("quantized" or "compressed") into a sequence of discrete variables. 
From this latter sequence, a receiver must attempt to approximate 
some features of the source sequence. Operation is in real time, and 
the distortion measure does not tolerate delays. Such a situation has 
been investigated over infinite time spans in Ref. 1. In the present 
work, only finite time spans are considered. 

The main result is the following. If the source is kth-order Markov,
one may, without loss, assume that the encoder forms each output 
using only the last k source symbols and the present state of the 
receiver's memory. 

An example is constructed, which shows that the Markov property 
is essential. 

II. THE MODEL 

2.1 The causal structure

A source produces a random sequence $X_1, X_2, \ldots, X_T$, where for
each $t \in \{1, \ldots, T\}$, $X_t$ is a vector in $n_t$-dimensional real
space. The source is characterized by the sequence distribution: a given
probability measure on the Borel sets of the product space of dimension
$\sum_{t=1}^{T} n_t$.

For each t, there is an opportunity for noiseless transmission of a
signal $Y_t$ taking $q_t$ possible values. This signal is produced from
the X sequence by an encoder. As we consider the problem in real time,
causality allows the encoder at t to see only the values
$X_1, \ldots, X_t$. The encoders are thus characterized by functions
$f_t : R^{n_1 + \cdots + n_t} \to \{1, \ldots, q_t\}$, Borel measurable,
$t = 1, \ldots, T$.

At the receiving end, the most that could be accessible at stage t is
the subsequence $Y_1, \ldots, Y_t$. However, we also want to consider the
case of limited memory, as the receiver might not be able to store this
whole sequence for large t. The model will be the following.

At t = 1, only $Y_1$ is available, and a discrete variable
$Z_1 = r_1(Y_1)$ taking $m_1$ values is stored in memory. For each
t > 1, the memory is updated by

$$Z_t = r_t(Z_{t-1}, Y_t), \quad t = 2, \ldots, T-1,$$

where $Z_t$ takes values in $\{1, \ldots, m_t\}$ and

$$r_t : \{1, \ldots, m_{t-1}\} \times \{1, \ldots, q_t\} \to \{1, \ldots, m_t\}$$

is the memory update function.

The purpose of the receiver is to generate a variable $V_t$ in
$R^{s_t}$ by

$$V_1 = g_1(Y_1),$$

where $g_1 : \{1, \ldots, q_1\} \to R^{s_1}$,
and for t > 1

$$V_t = g_t(Z_{t-1}, Y_t),$$

where

$$g_t : \{1, \ldots, m_{t-1}\} \times \{1, \ldots, q_t\} \to R^{s_t}.$$

The interpretation of $V_t$ is that it represents an approximation to
something we wish to know at the receiving end about $X_t$. In
particular, one may have $s_t = n_t$ and consider $V_t$ as approximating
$X_t$ itself.

The functional relationships described above are symbolized in 
Fig. 1. 

The case of full receiver memory is included in this model. One need
only identify $Z_t$ with $(Y_1, \ldots, Y_t)$ and $r_t$ with the
concatenation function "append."

Furthermore, in this case,

$$m_t = \prod_{\theta=1}^{t} q_\theta.$$
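The functional relationships of this section can be traced on a single source realization with a short Python sketch. It is an illustration only: the names `run_system` and `append`, the use of `None` for the empty memory before stage 1, and the sign-quantizer example are assumptions introduced here and are not part of the model.

```python
def run_system(xs, encoders, updates, decoders):
    """Trace Y_t, V_t for one realization xs = (x_1, ..., x_T).

    encoders[t-1](x_1..x_t)    -> y_t   (causal encoder f_t)
    decoders[t-1](z_{t-1}, y_t) -> v_t  (receiver g_t)
    updates[t-1](z_{t-1}, y_t)  -> z_t  (memory update r_t)
    Before stage 1 the memory is None (an assumption of this sketch).
    """
    z = None
    ys, vs = [], []
    for t, (f, r, g) in enumerate(zip(encoders, updates, decoders), start=1):
        y = f(tuple(xs[:t]))  # causality: only x_1..x_t are visible
        v = g(z, y)           # output uses the old memory and the new symbol
        z = r(z, y)           # then the memory is updated
        ys.append(y)
        vs.append(v)
    return ys, vs


def append(z, y):
    """Full receiver memory: Z_t is the whole code sequence (Y_1, ..., Y_t)."""
    return (() if z is None else z) + (y,)


T = 3
encoders = [lambda past: 1 if past[-1] >= 0 else 2] * T   # hypothetical sign quantizer
updates = [append] * T
decoders = [lambda z, y: 1.0 if y == 1 else -1.0] * T     # hypothetical levels
ys, vs = run_system([0.5, -2.0, 3.0], encoders, updates, decoders)
print(ys, vs)  # [1, 2, 1] [1.0, -1.0, 1.0]
```

With `append` as the update function, the memory after stage t holds t symbols, which matches the count of possible memory states given above for the full memory case.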
2.2 The criteria

The performance of the system is defined by way of a sequence of
distortion functions. For each t, a Borel measurable function

$$\psi_t : R^{n_t} \times R^{s_t} \to [0, \infty)$$

is given. Then

$$J_t = E\{\psi_t(X_t, V_t)\}$$

measures the distortion at stage t. It is possible that $J_t$ be
infinite, but it is always well defined, as the composition of Borel
functions is Borel and $\psi_t \ge 0$.

2.3 The optimization problem 

The problem to be considered is the following: 
Given: The integers $T$; $n_1, \ldots, n_T$; $q_1, \ldots, q_T$;
$m_1, \ldots, m_T$; $s_1, \ldots, s_T$.

The distribution of the X sequence.

The distortion measures $\psi_1, \ldots, \psi_T$.
Choose: The functions $f_1, \ldots, f_T$; $g_1, \ldots, g_T$;
$r_1, \ldots, r_T$ (the latter are redundant in the full memory case,
i.e., when $m_t \ge q_1 q_2 \cdots q_t$ for all t). A choice of a system
of such functions will be called a "design."

In order to:
Minimize (exactly or within $\epsilon$) the sum

$$J = \sum_{t=1}^{T} J_t.$$

Fig. 1 — General system. 
Remark that nothing would be gained by having $J$ as a nonnegative
linear combination $\sum c_t J_t$ (for instance, with
$c_t = e^{-\lambda t}$, a discount factor) because such $c_t \ge 0$ can
simply be absorbed into the definition of $\psi_t$.

It should be said that the freedom of having $n_t$, $q_t$, $m_t$, $s_t$,
$\psi_t$ depend upon t is not a matter of extra generality, but is
essential to the proof techniques used in the sequel.

A design producing the values $(J_1, \ldots, J_T)$ is at least as good
as a design producing $(J'_1, \ldots, J'_T)$ when $J_t \le J'_t$ for all
$t \in \{1, \ldots, T\}$. This, of course, implies the much weaker
statement that $J = \sum J_t \le J' = \sum J'_t$.

A design may exist which is at least as good as any other; it is called
a dominant design. In general, however, no dominant design exists
because the set in $R^T$ of achievable vectors $(J_1, \ldots, J_T)$ does
not have a corner $(J^*_1, \ldots, J^*_T)$ such that all other points of
this set lie in the shifted orthant $J_t \ge J^*_t$, $t = 1, \ldots, T$.
Instead, the set may have a Pareto frontier of "admissible" vectors,
i.e., vectors $(J_1, \ldots, J_T)$ such that no vector
$(J'_1, \ldots, J'_T)$ is achievable that has $J'_t \le J_t$ for all t
with strict inequality for some t.
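The two orderings just described can be made concrete for a finite list of cost vectors. The following Python sketch is only illustrative: the function names and the numerical vectors are hypothetical, and a real achievable set need not be finite.

```python
def at_least_as_good(j, j_other):
    """True when j[t] <= j_other[t] holds for every stage t."""
    return all(a <= b for a, b in zip(j, j_other))


def admissible(vectors):
    """Members of a finite list that no other member beats, i.e., the
    Pareto frontier restricted to that list."""
    def beats(a, b):
        return at_least_as_good(a, b) and any(x < y for x, y in zip(a, b))
    return [v for v in vectors if not any(beats(w, v) for w in vectors)]


def dominant(vectors):
    """A member at least as good as every member of the list, or None."""
    for v in vectors:
        if all(at_least_as_good(v, w) for w in vectors):
            return v
    return None


no_corner = [(1.0, 2.0), (2.0, 1.0), (2.0, 2.0)]  # hypothetical values
print(admissible(no_corner))  # [(1.0, 2.0), (2.0, 1.0)]
print(dominant(no_corner))    # None
```

In this toy list two vectors are admissible but neither is dominant, which is the situation described above as the absence of a corner.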

2.4 Special encoder structures 

The encoder $f_t$ at a specific stage t > k is said to have memory
structure of order k, if there is a Borel function

$$\hat{f}_t : \{1, \ldots, m_{t-1}\} \times R^{n_{t-k+1} + \cdots + n_t} \to \{1, \ldots, q_t\}$$

such that

$$f_t(X_1, \ldots, X_t) = \hat{f}_t(Z_{t-1}, X_{t-k+1}, \ldots, X_t) \quad \text{a.s.}$$

This is equivalent to the assertion that $Y_t$ is measurable on the
$\sigma$-field generated by $Z_{t-1}, X_{t-k+1}, \ldots, X_t$. In other
words, the encoder elaborates $Y_t$ using only the k most recent source
outputs $X_{t-k+1}, \ldots, X_t$ and the receiver's current memory
$Z_{t-1}$.

III. THE MAIN THEOREM 

The sequence $X_1, X_2, \ldots, X_T$ is said to be kth-order Markov,
when, given any block of k consecutive $X_t$, the parts of the sequence
preceding and following this block are conditionally independent. For
k = 1, this is the ordinary Markov property. Note that the kth-order
Markov property holds in a vacuous way if T < k + 2.

Most sequences can be approximated by kth-order Markov sequences for
sufficiently large k. If this k is small compared to T, then the
following main theorem provides a substantial simplification of the
encoder optimization problem.

Theorem 1: Suppose the source is kth-order Markov. Then, given any
design, there is another design with the following properties:

(i) The new design differs from the given one only in the choice of
encoders.

(ii) All encoders of the new design have memory structure of order
k. (The last encoder $f_T$ can even be made to have memory structure of
order 1.)

(iii) The performance index J of the new design does not exceed
the index of the old design.

Postponing the proof of Theorem 1 to Section V, we comment here
on its significance. It says, in particular, that for a Markov source and
a receiver with perfect memory, one need only consider encoders which
generate each code symbol $Y_t$ using only the current source symbol
$X_t$ and the past code sequence $Y_1, Y_2, \ldots, Y_{t-1}$. This result
is essentially dependent on the Markov property of the source as can be
seen from the following example.

Take T = 3 and, for t = 1, 2, 3, let $n_t = s_t = 1$, $q_t = 2$,
$\psi_t(X_t, V_t) = (X_t - V_t)^2$. Suppose the receiver has perfect
memory. Suppose that the source sequence $(X_1, X_2, X_3)$ takes just
eight equally probable values, namely (13, 1, 3), (12, 1, 2), (11, 1, 1),
(10, 1, 0), (-10, -1, 0), (-11, -1, 1), (-12, -1, 2), (-13, -1, 3).

At the first stage, if one considers only the minimization of $J_1$, one
has a classical quantization problem for $X_1$. As $X_1$ takes its values
in two separate equiprobable clusters, the minimum of $J_1$ is attained
by letting $Y_1$ signal the sign of $X_1$ to identify the cluster. Then
$V_1 = \pm 11.5$ and $J_1 = 1.25$. Any other choice of the first encoder
yields a strictly larger value for $J_1$. Furthermore, $Y_1$ is already
sufficient for the attainment of $J_2 = 0$; the second-stage receiver
need not even look at $Y_2$. However, $Y_2$ can be used to help the
third-stage receiver. If one lets $Y_2$ signal the parity of $X_1$, then
$J_3 = 0$ is attainable by letting $Y_3$ signal whether $|X_3| \le 1$ or
not.

The design so obtained minimizes $J_t$ for each t (it is "dominant"); a
fortiori, it minimizes $J = \sum_{t=1}^{3} J_t$, giving $J = 1.25$.
However, the second-stage encoder does not have memory structure of
order one.

Is it possible to achieve J = 1.25 with memory structure of order one
although the source is not Markov? The answer is no for, if one
changes the first-stage encoder, this alone will drive $J_1$ and, a
fortiori, J above 1.25. But if the first encoder signals the sign of
$X_1$ and the second encoder must have first-order structure, then the
second encoder is useless. Indeed, $X_2$ contains no information not
already contained in $Y_1$, and the receiver remembers $Y_1$. Now $Y_1$
is useless to the third-stage receiver,† and a single binary signal
$Y_3$ is insufficient to distinguish among the four possible values of
$X_3$. The best that can be done is to form $Y_3$ as in the previous
design, giving $J_3 = 0.25$; hence, $J = 1.5$.

† $Y_1 = \operatorname{sgn} X_1$ and $X_3$ are independent.

The optimum design requires encoder $f_2$ to "signal ahead" features
of $X_1$ for the later benefit of receiver $g_3$. This phenomenon is
ruled out for sources with the Markov property.
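The two cost totals of the example can be checked numerically. The Python sketch below assumes, as is standard for squared error, that the best receiver outputs the conditional mean of the target given everything it has received; the names `stage_cost`, `total_cost`, and the encoder lambdas are introduced here for illustration. Each encoder is written as a function of the whole triple but reads only components available at its stage.

```python
from fractions import Fraction

# The eight equally probable source sequences (X1, X2, X3) of the example.
SOURCE = [(13, 1, 3), (12, 1, 2), (11, 1, 1), (10, 1, 0),
          (-10, -1, 0), (-11, -1, 1), (-12, -1, 2), (-13, -1, 3)]


def stage_cost(t, encoders):
    """J_t for a full-memory receiver that outputs the conditional mean of
    X_t given (Y_1, ..., Y_t), the optimal decoder for squared error."""
    groups = {}
    for x in SOURCE:
        received = tuple(f(x) for f in encoders[:t])
        groups.setdefault(received, []).append(Fraction(x[t - 1]))
    sse = Fraction(0)
    for values in groups.values():
        mean = sum(values) / len(values)
        sse += sum((v - mean) ** 2 for v in values)
    return sse / len(SOURCE)


def total_cost(encoders):
    return sum(stage_cost(t, encoders) for t in (1, 2, 3))


sign = lambda x: x[0] > 0          # Y1: which cluster X1 lies in
parity = lambda x: x[0] % 2        # Y2: parity of X1 (not first-order)
small = lambda x: abs(x[2]) <= 1   # Y3: whether |X3| <= 1
useless = lambda x: 0              # a Y2 that carries no information

J_signal_ahead = total_cost([sign, parity, small])
J_first_order = total_cost([sign, useless, small])
print(J_signal_ahead, J_first_order)  # 5/4 3/2
```

Exact rational arithmetic is used so that the comparison with 1.25 and 1.5 is not affected by rounding.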

IV. TWO BASIC LEMMATA 

All the results in this paper will be derived from two basic lemmata: 
a "two-stage lemma" and a more complex "three-stage lemma." Once 
these are obtained, the use of induction and of the technique of 
"repackaging" random variables will suffice. 

4.1 The two-stage lemma

This lemma uses what is, in fact, the basic line of reasoning in Ref.
1. Consider a system with T = 2 and any joint distribution of the pair
of random vectors $(X_1, X_2)$. Observe that the content $Z_1$ of the
receiver's memory at the beginning of stage 2 is a certain function of
$X_1$; that is,

$$Z_1 = \phi(X_1),$$

where $\phi$ is a Borel function (in fact, it is the composition of
$f_1$ and $r_1$). The second (and last) stage is characterized by the
functions $f_2$ and $g_2$ with (Fig. 2)

$$Y_2 = f_2(X_1, X_2),$$

$$V_2 = g_2(Z_1, Y_2).$$

Lemma 1: Given a two-stage system with a design in which $f_2$ does
not have memory structure of order 1, one can change $f_2$ (and only
$f_2$) so that it has this structure and the new design is at least as
good as the given design.

Proof: If only $f_2$ is changed, then $J_1$, $\phi$, and $g_2$ remain as
given. We have to show that, for a suitable change in $f_2$, $J_2$ can
only decrease. Consider the function

$$F((Z_1, X_2), Y_2) \equiv \psi_2(X_2, g_2(Z_1, Y_2)).$$

As $Y_2$ is discrete and F is measurable (by its construction), a
measurable function $\hat{f}_2$ exists (see the appendix) such that

$$F((Z_1, X_2), \hat{f}_2(Z_1, X_2)) \le F((Z_1, X_2), Y_2)$$

for all values of $Z_1$, $X_2$, $Y_2$. Hence, by the substitution

$$Z_1 = \phi(X_1), \qquad Y_2 = f_2(X_1, X_2),$$

the inequality

$$\psi_2(X_2, g_2(\phi(X_1), \hat{f}_2(\phi(X_1), X_2))) \le \psi_2(X_2, g_2(\phi(X_1), f_2(X_1, X_2)))$$

Fig. 2 — Two-stage lemma. 

holds for all $X_1$, $X_2$. As the functions $\phi$, $f_2$ and
$\hat{f}_2$ are measurable, both sides of the inequality are measurable.
Since they are also nonnegative, the inequality persists when taking the
expectation of both sides, whether finite or not. This establishes that
$J_2$ can only decrease as claimed.
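For a finite code alphabet, the replacement encoder of Lemma 1 can be computed pointwise. The following Python sketch shows one way to do so; `improve_last_encoder`, the squared-error distortion, and the decoder with levels `z1` and `z1 + 1` are hypothetical choices made for illustration.

```python
def improve_last_encoder(psi2, g2, q2):
    """Return an encoder of (z1, x2) only that, for each pair, picks a
    label y2 in {1, ..., q2} minimizing psi2(x2, g2(z1, y2))."""
    def f2_hat(z1, x2):
        # min() returns the first minimizer, so ties go to the smallest label.
        return min(range(1, q2 + 1), key=lambda y2: psi2(x2, g2(z1, y2)))
    return f2_hat


psi2 = lambda x2, v2: (x2 - v2) ** 2   # squared-error distortion
g2 = lambda z1, y2: z1 + (y2 - 1)      # hypothetical decoder levels z1, z1 + 1
f2_hat = improve_last_encoder(psi2, g2, q2=2)
print(f2_hat(10, 10.8))  # 2
print(f2_hat(10, 10.5))  # 1 (tie resolved toward the smallest label)
```

Because the returned function never looks at $X_1$ except through `z1`, it has memory structure of order one by construction.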
Fig. 3— Three-stage lemma.

4.2 The three-stage lemma

Consider a three-stage system (T = 3) with a Markov source. Assume
that the last encoder $f_3$ already has first-order memory structure,
while $f_2$ does not (Fig. 3).

Lemma 2: Under the above assumptions, one can replace $f_2$ by an
encoder $\hat{f}_2$ having memory structure of order one, without
increasing the total cost $J = J_1 + J_2 + J_3$.

Proof: The first-stage cost $J_1$ is unaffected by changes in $f_2$ and
the effect of the first-stage design is to generate the receiver memory
$Z_1$ as a certain measurable function $Z_1 = \phi(X_1)$, where $\phi$
is the composition $r_1 \circ f_1$. By assumption, $f_3$ can be written
in the form

$$Y_3 = f_3(Z_2, X_3)$$

where $Z_2 = r_2(Z_1, Y_2)$.

The cost incurred in the last two stages can thus be written

$$\psi_2(X_2, g_2(Z_1, Y_2)) + \psi_3(X_3, g_3(r_2(Z_1, Y_2), f_3(r_2(Z_1, Y_2), X_3))) \equiv F(Z_1, X_2, X_3, Y_2),$$

defining the measurable function F.
Consider now the conditional expectation

$$E\{F(Z_1, X_2, X_3, Y_2) \mid X_1, X_2\}.$$

Because $X_3$ is a finite dimensional random vector, a regular
conditional distribution of $X_3$ exists for any condition. In view of
the Markov property, conditioning on the pair $(X_1, X_2)$ is equivalent
to conditioning on $X_2$ only. Let $P(dX_3 \mid X_2)$ be a regular
version of this conditional distribution.

Then the conditional expectation under consideration can be written

$$\int P(dX_3 \mid X_2) \, F(Z_1, X_2, X_3, Y_2),$$

where $Z_1$ and $Y_2$, which depend only on the conditioning variables
$X_1$, $X_2$, can be treated as fixed. This integral defines a
measurable function (nonnegative and possibly extended real-valued)

$$G(Z_1, X_2, Y_2).$$

For any choice of $f_2$, the sum $J_2 + J_3$ will be given by the
expectation of G. Note that $X_1$ enters G only by way of $Z_1$ and
$Y_2$.

As in Lemma 1, a measurable function $\hat{f}_2$ exists such that, for
all $Z_1$, $X_2$ and $Y_2$,

$$G(Z_1, X_2, \hat{f}_2(Z_1, X_2)) \le G(Z_1, X_2, Y_2).$$
1444 THE BELL SYSTEM TECHNICAL JOURNAL, JULY-AUGUST 1979 



Substituting $Z_1 = \phi(X_1)$, $Y_2 = f_2(X_1, X_2)$ and taking the
expectations of both sides of this inequality, implies, by the chain
rule, that $J_2 + J_3$ cannot increase when $f_2$ is replaced by
$\hat{f}_2$.
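When $X_2$ and $X_3$ take finitely many values, the function G of the proof becomes a finite sum and the improved encoder can be tabulated. The Python sketch below assumes such a finite source; the transition table `p3_given_x2`, the function names, and all stage functions in the example are hypothetical.

```python
def make_G(psi2, psi3, g2, g3, r2, f3, p3_given_x2):
    """G(z1, x2, y2): stage-2 cost plus expected stage-3 cost given X2 = x2,
    for a first-order last encoder f3 and a finite conditional law of X3."""
    def G(z1, x2, y2):
        z2 = r2(z1, y2)
        future = sum(p * psi3(x3, g3(z2, f3(z2, x3)))
                     for x3, p in p3_given_x2[x2].items())
        return psi2(x2, g2(z1, y2)) + future
    return G


def improve_middle_encoder(G, q2):
    """Encoder of (z1, x2) choosing the smallest label that minimizes G."""
    return lambda z1, x2: min(range(1, q2 + 1), key=lambda y2: G(z1, x2, y2))


sq = lambda x, v: (x - v) ** 2
# Hypothetical Markov step: X3 is 2*X2 or 2*X2 + 1 with equal probability.
p3_given_x2 = {0: {0: 0.5, 1: 0.5}, 1: {2: 0.5, 3: 0.5}}
G = make_G(psi2=sq, psi3=sq,
           g2=lambda z1, y2: y2 - 1,
           g3=lambda z2, y3: 2 * (z2 - 1) + (y3 - 1),
           r2=lambda z1, y2: y2,
           f3=lambda z2, x3: 1 + x3 % 2,
           p3_given_x2=p3_given_x2)
f2_hat = improve_middle_encoder(G, q2=2)
print(f2_hat(1, 0), f2_hat(1, 1))  # 1 2
print(G(1, 0, 2))                  # 5.0
```

The minimization accounts for the effect of $Y_2$ on the third stage through the memory update, which is the difference from the two-stage construction.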

V. PROOF OF THE MAIN THEOREM 

To begin with, the situation of the last stage is always a special one, 
as the following lemma shows. 

Lemma 3: For any source statistics and any design, one can replace 
the last encoder by one having memory structure of order one, without 
performance loss. 

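Proof: The given T-stage system can be considered as a two-stage
system, by setting

$$\tilde{X}_1 = (X_1, X_2, \ldots, X_{T-1})$$

$$\tilde{X}_2 = X_T$$

$$\tilde{Z}_1 = Z_{T-1} = \phi(\tilde{X}_1)$$

$$\tilde{Y}_2 = Y_T$$

$$\tilde{f}_2(\tilde{X}_1, \tilde{X}_2) = f_T(X_1, X_2, \ldots, X_{T-1}, X_T)$$

$$\tilde{g}_2(\tilde{Z}_1, \tilde{Y}_2) = g_T(Z_{T-1}, Y_T)$$

$$\tilde{V}_1 = (V_1, \ldots, V_{T-1})$$

$$\tilde{V}_2 = V_T$$

$$\tilde{\psi}_1(\tilde{X}_1, \tilde{V}_1) = \sum_{t=1}^{T-1} \psi_t(X_t, V_t)$$

$$\tilde{\psi}_2(\tilde{X}_2, \tilde{V}_2) = \psi_T(X_T, V_T),$$

which amounts to a change in notation. Of course,

$$\tilde{n}_1 = \sum_{t=1}^{T-1} n_t,$$

a substantial increase in dimension.

By Lemma 1, there exists an encoder $\hat{\tilde{f}}_2$ which has the
structure

$$\tilde{Y}_2 = \hat{\tilde{f}}_2(\tilde{Z}_1, \tilde{X}_2)$$

and whose use does not increase $\tilde{J}_2$. Reverting to the original
notation, this corresponds to an encoder $\hat{f}_T$ with the structure

$$Y_T = \hat{f}_T(Z_{T-1}, X_T)$$

whose use does not increase $J_T$. As the other $J_t$ are unchanged, the
lemma is proved.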
The above fact is the starting point for the proof of the main theorem
with k = 1.

Lemma 4: The main theorem holds for k = 1. That is, for a Markov 
source and any design, one can replace the encoders by appropriate 
encoders having first-order memory structure without increase in the 
expected cost J. 

Proof: Using backward induction, one can first replace $f_T$ by
$\hat{f}_T$, as in Lemma 3. Now suppose the encoders for stages
$t + 1, t + 2, \ldots, T$ already have memory structure of order one. It
must be shown that $f_t$ can be replaced by $\hat{f}_t$ with such
structure, without increase in expected total cost. To this effect, the
T-stage system can be considered as a three-stage system, in which the
third stage has first-order memory structure and the source is Markov,
as follows. Set

$$\tilde{X}_1 = (X_1, \ldots, X_{t-1}); \quad \text{thus, } \tilde{n}_1 = \sum_{\theta=1}^{t-1} n_\theta,$$

$$\tilde{Z}_1 = Z_{t-1} = \phi(\tilde{X}_1)$$

$$\tilde{X}_2 = X_t$$

$$\tilde{Y}_2 = Y_t$$

$$\tilde{Z}_2 = Z_t = \tilde{r}_2(\tilde{Z}_1, \tilde{Y}_2) = r_t(Z_{t-1}, Y_t)$$

$$\tilde{V}_2 = V_t = \tilde{g}_2(\tilde{Z}_1, \tilde{Y}_2) = g_t(Z_{t-1}, Y_t)$$

$$\tilde{\psi}_2(\tilde{X}_2, \tilde{V}_2) = \psi_t(X_t, V_t)$$

$$\tilde{X}_3 = (X_{t+1}, \ldots, X_T), \quad \tilde{n}_3 = \sum_{\theta=t+1}^{T} n_\theta$$

$$\tilde{Y}_3 = (Y_{t+1}, \ldots, Y_T), \quad \tilde{q}_3 = \prod_{\theta=t+1}^{T} q_\theta$$

$$\tilde{V}_3 = (V_{t+1}, \ldots, V_T) = \tilde{g}_3(\tilde{Z}_2, \tilde{Y}_3), \quad \tilde{s}_3 = \sum_{\theta=t+1}^{T} s_\theta.$$

The latter relation follows from the fact that each $V_\theta$,
$\theta > t$, is given by $g_\theta(Z_{\theta-1}, Y_\theta)$ and the
variables $Z_{\theta-1}$, $Y_\theta$ are known functions of $Z_t$,
$Y_{t+1}$, $Y_{t+2}$, $\ldots$, $Y_T$ using the memory update functions.
Then

$$\tilde{\psi}_3(\tilde{X}_3, \tilde{V}_3) = \sum_{\theta=t+1}^{T} \psi_\theta(X_\theta, V_\theta).$$

As the encoders for stages $t + 1, \ldots, T$ already have first-order
memory structure, their effect is to define an encoder

$$\tilde{Y}_3 = \tilde{f}_3(\tilde{Z}_2, \tilde{X}_3)$$

because each of the $Y_\theta$, $\theta > t$, included in $\tilde{Y}_3$
is given by a function $f_\theta(Z_{\theta-1}, X_\theta)$ and the
variables $Z_{\theta-1}$, $X_\theta$ are known functions of
$\tilde{Z}_2$, $\tilde{X}_3$; i.e., of $Z_t, X_{t+1}, \ldots, X_T$ using
the memory update functions and recursion. The given encoder at stage t
has the general form $Y_t = f_t(X_1, X_2, \ldots, X_{t-1}, X_t)$ which
translates to

$$\tilde{Y}_2 = \tilde{f}_2(\tilde{X}_1, \tilde{X}_2).$$

The new source $(\tilde{X}_1, \tilde{X}_2, \tilde{X}_3)$ is Markov since
$\tilde{X}_1 = (X_1, \ldots, X_{t-1})$ and
$\tilde{X}_3 = (X_{t+1}, \ldots, X_T)$ are conditionally independent
given $\tilde{X}_2 = X_t$, by the assumed Markov property of the
original source.

Thus, the three-stage system satisfies the assumptions of Lemma 2,
and $\tilde{f}_2$ can be replaced without loss of total expected cost by
$\hat{\tilde{f}}_2$, which has the structure

$$\tilde{Y}_2 = \hat{\tilde{f}}_2(\tilde{Z}_1, \tilde{X}_2).$$

This translates to an encoder

$$Y_t = \hat{f}_t(Z_{t-1}, X_t)$$

for the original problem. Since the notational changes do not influence
total cost, the inductive step, and therefore the lemma, is proved.

Note that the above induction is carried out down to t = 2 because
$f_1$ cannot help but have the desired structure, albeit trivially so.

Turning to the case of general k, observe that the encoders for the
first k stages have memory structure of order k in a trivial way,
whatever their design, and for the last stage, Lemma 3 applies. Thus
the conclusion of the main theorem holds for $T \le k + 1$ trivially, as
does the assumption on the source. Hence, assume $T \ge k + 2$.

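The essence of the proof is a "sliding block" repackaging of the
source variables.
Let $\tilde{X}_t = (X_t, X_{t+1}, \ldots, X_{t+k-1})$

for $t = 1, \ldots, \tilde{T}$ where $\tilde{T} = T - k + 1 \ge 3$.

Then the sequence $(\tilde{X}_1, \tilde{X}_2, \ldots, \tilde{X}_{\tilde{T}})$
is Markov. For the variables, let

$$\tilde{Y}_1 = (Y_1, \ldots, Y_k),$$

$$\tilde{Y}_t = Y_{t+k-1}, \quad \text{for } t = 2, \ldots, \tilde{T}$$

$$\tilde{Z}_t = Z_{t+k-1}, \quad \text{for } t = 1, \ldots, \tilde{T}$$

$$\tilde{V}_1 = (V_1, \ldots, V_k),$$

and $\tilde{V}_t = V_{t+k-1}$, for $t = 2, \ldots, \tilde{T}$.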
The functions relating these variables are written as follows:

$$\tilde{Y}_1 = \tilde{f}_1(\tilde{X}_1)$$

summarizes the action of the first k encoders, which will remain
unchanged as they already (trivially) have memory structure of order
k. For t > 1,

$$\tilde{Y}_t = \tilde{f}_t(\tilde{X}_1, \tilde{X}_2, \ldots, \tilde{X}_t) = f_{t+k-1}(X_1, \ldots, X_{t+k-1})$$

where $\tilde{f}_t$ is not uniquely defined by this relation. It can be
made unique and measurable by requiring (for example) that for
$\theta = 2, \ldots, t$, the function $\tilde{f}_t$ depends on the
argument $\tilde{X}_\theta = (X_\theta, \ldots, X_{\theta+k-1})$ only
through its last component $X_{\theta+k-1}$. However, any measurable
$\tilde{f}_t$ satisfying the identity is acceptable.

The receivers are characterized by their memory updating functions:

$$\tilde{Z}_1 = \tilde{r}_1(\tilde{Y}_1)$$

summarizes the recursive buildup of $Z_k$ from $(Y_1, \ldots, Y_k)$
using $r_1, \ldots, r_k$. For $t \ge 2$, $\tilde{r}_t$ is defined by

$$\tilde{Z}_t = \tilde{r}_t(\tilde{Z}_{t-1}, \tilde{Y}_t) = r_{t+k-1}(Z_{t+k-2}, Y_{t+k-1}).$$

Likewise,

$$\tilde{V}_1 = \tilde{g}_1(\tilde{Y}_1)$$

summarizes the action of the first k decoders (including their memory
updating). For $t \ge 2$, $\tilde{g}_t$ is defined by

$$\tilde{V}_t = \tilde{g}_t(\tilde{Z}_{t-1}, \tilde{Y}_t) = g_{t+k-1}(Z_{t+k-2}, Y_{t+k-1}).$$

Finally, $\tilde{\psi}_1(\tilde{X}_1, \tilde{V}_1) = \sum_{t=1}^{k} \psi_t(X_t, V_t)$
and for $t \ge 2$

$$\tilde{\psi}_t(\tilde{X}_t, \tilde{V}_t) = \psi_{t+k-1}(X_{t+k-1}, V_{t+k-1}),$$

where $\tilde{\psi}_t$ depends on argument $\tilde{X}_t$ only through
the component $X_{t+k-1}$. Now Lemma 4 can be applied to this
$\tilde{T}$-stage system with Markov source. Without increase in total
cost, the encoders $\tilde{f}_2, \ldots, \tilde{f}_{\tilde{T}}$ can be
replaced by encoders $\hat{\tilde{f}}_t$ with first-order memory
structure, i.e.,

$$\tilde{Y}_t = \hat{\tilde{f}}_t(\tilde{Z}_{t-1}, \tilde{X}_t), \quad \text{for } t = 2, \ldots, \tilde{T}.$$

Expressing this in terms of the original variables, the functions $f_t$
for $t = k + 1, \ldots, T$ are replaced by functions $\hat{f}_t$
satisfying

$$Y_{t+k-1} = \hat{f}_{t+k-1}(Z_{t+k-2}, X_t, X_{t+1}, \ldots, X_{t+k-1}) \quad \text{for } t = 2, \ldots, T - k + 1$$

or equivalently

$$Y_t = \hat{f}_t(Z_{t-1}, X_{t-k+1}, \ldots, X_t) \quad \text{for } t = k + 1, \ldots, T.$$

These $\hat{f}_t$ exhibit memory structure of order k, so that the main
theorem is proved.
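The "sliding block" repackaging used in this proof is easy to visualize on a list of symbols. The Python sketch below is only an illustration; the function name `sliding_blocks` and the placeholder symbols are assumptions of this sketch.

```python
def sliding_blocks(xs, k):
    """Repackage x_1..x_T into the T - k + 1 overlapping blocks
    (x_t, ..., x_{t+k-1}), t = 1, ..., T - k + 1."""
    if not 1 <= k <= len(xs):
        raise ValueError("need 1 <= k <= T")
    return [tuple(xs[t:t + k]) for t in range(len(xs) - k + 1)]


xs = ["x1", "x2", "x3", "x4", "x5"]   # placeholder source symbols, T = 5
blocks = sliding_blocks(xs, k=3)
print(len(blocks))   # 3, i.e., T - k + 1
print(blocks[1])     # ('x2', 'x3', 'x4')
```

Consecutive blocks share k - 1 symbols, so an encoder that sees only the current block sees exactly the k most recent source outputs of the original sequence.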

VI. THE CASE OF DELAYED DISTORTION MEASURES 

The basic model of Section II can be modified as follows for the case
in which a delay of $\delta > 0$ steps is included in the definition of
distortion. The first change is that the variables
$V_1, \ldots, V_\delta$ are simply not generated; the receiver spends
its first $\delta$ periods just accumulating observations of
$Y_1, \ldots, Y_\delta$ and updating its memory accordingly.

The second change is that, for $t > \delta$, distortion is measured by a
function $\psi_t(X_{t-\delta}, V_t)$ whose expectation defines $J_t$.
The design objective is to minimize

$$J = \sum_{t=\delta+1}^{T} J_t.$$

For this situation, the following structure simplifying result holds. 

Delay Theorem: Suppose that the source is kth-order Markov and
that the distortion is defined with delay $\delta$. Then any given
design can be replaced, without loss, by one in which the encoders have
memory structure of order $\max(k, \delta + 1)$.

Proof: In case $k \ge \delta + 1$, one can perform the same
transformation of the point of view as in the proof of the main theorem.
Indeed, this transformation gives cost functions of the form

$$\tilde{\psi}_t(\tilde{X}_t, \tilde{V}_t), \quad t = 1, \ldots, T - k + 1,$$

where

$$\tilde{X}_t = (X_t, \ldots, X_{t+k-1}), \quad \tilde{V}_1 = (V_1, \ldots, V_k), \quad \tilde{V}_t = V_{t+k-1}.$$

This is compatible with the delay criterion, as follows:

$$\tilde{\psi}_1(\tilde{X}_1, \tilde{V}_1) = \sum_{t=1}^{k-\delta} \psi_{t+\delta}(X_t, V_{t+\delta})$$

and for $t = 2, \ldots, T - k + 1$

$$\tilde{\psi}_t(\tilde{X}_t, \tilde{V}_t) = \psi_{t+k-1}(X_{t+k-\delta-1}, V_{t+k-1}),$$

where it happens that $\tilde{\psi}_t$ depends upon $\tilde{X}_t$ only
through the component $X_{t+k-\delta-1}$.

Therefore the argument of the main theorem applies: One can use
encoders with memory structure of order k. In fact, the above shows
that this conclusion is valid for any criteria of the form

$$\sum_t \psi_t(X_t, X_{t+1}, \ldots, X_{t+k-1}, V_{t+k-1}).$$

As for the case $k < \delta + 1$, observe that the source is then, a
fortiori, Markov of order $\delta + 1$. Hence, the first case applies to
yield memory structure of order $\delta + 1$, as claimed.

VII. CONCLUDING REMARKS 

A few extensions of the results are of interest. 

(i) The proof of the three-stage lemma goes through under the
weaker assumption that $f_3$ depends upon $Z_2$, $X_2$, and $X_3$.

(ii) All the results in this paper remain true for $V_t$ restricted to
given subsets of $R^{s_t}$. This would correspond to quantization levels
fixed in advance, as opposed to their selection as part of the design.

(iii) Suppose $\delta = 0$, k = 1, and the encoder is restricted a
priori to be a finite state machine of the type

$$W_t = h_t(W_{t-1}, X_t),$$

$$Y_t = f_t(W_{t-1}, X_t),$$

where $W_t$ is a discrete variable representing the contents of the
encoder's memory. Then the main theorem implies that it is optimal
to take $Z_t = W_t$ and $h_t = r_t$ since this simulation of the
receiver's memory produces the argument required for the generation of
$Y_t$. This result was obtained independently by N. T. Gaarder.
APPENDIX

Let X be a set and $\mathcal{B}$ a $\sigma$-algebra of subsets of X.
Let Y be a finite set $\{1, \ldots, q\}$. A function
$F : X \times Y \to R$ is called measurable if for each y in Y, the
function $F(\cdot, y) : X \to R$ is $\mathcal{B}$-measurable.

Then it follows that a function $f : X \to Y$ exists such that

$$F(x, f(x)) \le F(x, y)$$

holds for all $x \in X$ and $y \in Y$ and the function f is
$\mathcal{B}$-measurable (which means that $\{x \mid f(x) = y\}$ is in
$\mathcal{B}$ for each y).

Since y takes only finitely many values, it is evident that, for each x,
one can select an f(x) to satisfy the inequality. However, there may be
many x for which the minimizing y is not unique. This creates the need
for a choice of values in defining f, and if such a choice were made in
a totally arbitrary manner, it is possible that the resulting f not be
$\mathcal{B}$-measurable. What is needed is the (elementary) proof that,
for a reasonable way to resolve ambiguous choices, the resulting f is
automatically $\mathcal{B}$-measurable.
Given $y \in Y$, consider the set $A_y$ of all x for which y is among
the minimizing values; this set is measurable because it is defined by a
finite number of inequalities among measurable functions, namely, for
each $y' \in Y$,

$$F(x, y) \le F(x, y').$$

The sets $A_y$ cover X but with overlaps. To remove the overlaps, use
the numerical indexing of Y to define

$$B_1 = A_1$$

and, for y > 1,

$$B_y = A_y - \bigcup_{i=1}^{y-1} A_i.$$

This construction preserves measurability and removes overlap. Thus,
if f is defined to take value y on $B_y$, the desired result is
attained. This amounts to stipulating that, when the minimum is attained
for more than one element of Y, f(x) is defined as the element with the
smallest label.
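On a finite set X the construction of the sets $A_y$ and $B_y$ can be carried out explicitly. The Python sketch below does so; the function name `minimizer_partition`, the five-point set, and the two quantization levels are hypothetical and serve only to show the overlap at a tie and its removal.

```python
def minimizer_partition(xs, q, F):
    """Return (A, B): A[y] is the set of x where label y attains the minimum
    of F(x, .); B[y] is A[y] minus all A[i] with i < y."""
    labels = range(1, q + 1)
    A = {y: {x for x in xs if all(F(x, y) <= F(x, yp) for yp in labels)}
         for y in labels}
    B, covered = {}, set()
    for y in labels:
        B[y] = A[y] - covered   # drop points already claimed by a smaller label
        covered |= A[y]
    return A, B


levels = {1: 1, 2: 3}                       # hypothetical reproduction levels
F = lambda x, y: (x - levels[y]) ** 2
A, B = minimizer_partition([0, 1, 2, 3, 4], q=2, F=F)
print(sorted(A[1]), sorted(A[2]))  # [0, 1, 2] [2, 3, 4]
print(sorted(B[1]), sorted(B[2]))  # [0, 1, 2] [3, 4]
```

The point x = 2 is equidistant from both levels, so it lies in both $A_1$ and $A_2$ and is assigned to the smaller label in the disjoint sets.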